
    Robust variable screening for regression using factor profiling

    Sure Independence Screening is a fast procedure for variable selection in ultra-high-dimensional regression analysis. Unfortunately, its performance deteriorates greatly with increasing dependence among the predictors. To solve this issue, Factor Profiled Sure Independence Screening (FPSIS) models the correlation structure of the predictor variables, assuming that it can be represented by a few latent factors. The correlations can then be profiled out by projecting the data onto the orthogonal complement of the subspace spanned by these factors. However, neither of these methods can handle the presence of outliers in the data. Therefore, we propose a robust screening method that uses least trimmed squares to estimate the latent factors and the factor-profiled variables. Variable screening is then performed on the factor-profiled variables using regression MM-estimators. Different types of outliers in this model and their roles in variable screening are studied. Both simulation studies and a real data analysis show that the proposed robust procedure performs well on clean data and outperforms the two nonrobust methods on contaminated data.
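    To make the profiling step concrete, here is a minimal, non-robust sketch of the idea in Python: latent factors are estimated by an SVD (a classical, outlier-sensitive choice; the paper replaces this and the screening step with least trimmed squares and MM-estimators), the data are projected onto the orthogonal complement of the factor space, and predictors are ranked by marginal correlation. The function name and tuning values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def factor_profiled_screening(X, y, n_factors=2, n_keep=10):
    """Toy, non-robust sketch of factor-profiled variable screening.

    1. Estimate latent factor scores from the predictors via an SVD.
    2. Project X and y onto the orthogonal complement of the factor space.
    3. Rank predictors by absolute marginal correlation of the profiled
       variables, as in Sure Independence Screening.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Leading left singular vectors estimate the latent factor scores.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :n_factors]
    # Projecting onto the orthogonal complement of span(F) profiles
    # the common correlation structure out of both X and y.
    P = np.eye(len(y)) - F @ F.T
    Xp, yp = P @ Xc, P @ yc
    cors = np.abs([np.corrcoef(Xp[:, j], yp)[0, 1] for j in range(X.shape[1])])
    return np.argsort(cors)[::-1][:n_keep]  # indices of the screened variables
```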

    Empirical comparison of the performance of location estimates of fuzzy number-valued data

    Several location measures have already been proposed in the literature in order to summarize the central tendency of a random fuzzy number in a robust way. Among them, fuzzy trimmed means and fuzzy M-estimators of location extend two successful approaches from the real-valued setting. The aim of this work is to present an empirical comparison of different location estimators, including both fuzzy trimmed means and fuzzy M-estimators, to study their differences in finite-sample behaviour.
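    For intuition only, the real-valued estimators that these fuzzy versions extend can be sketched as follows. This is an assumed, simplified analogue: the paper's estimators operate on fuzzy-number-valued data, which is not reproduced here.

```python
import numpy as np

def trimmed_mean(x, alpha=0.1):
    """Drop the alpha fraction of smallest and of largest values, then average."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(alpha * len(x)))
    return x[k:len(x) - k].mean()

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location via iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    s = 1.4826 * np.median(np.abs(x - mu))  # MAD-based robust scale
    for _ in range(max_iter):
        r = (x - mu) / max(s, 1e-12)
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```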

    Robust bootstrap procedures for the chain-ladder method

    Insurers are faced with the challenge of estimating the future reserves needed to handle historic and outstanding claims that are not fully settled. A well-known and widely used technique is the chain-ladder method, which is a deterministic algorithm. To include a stochastic component, one may apply generalized linear models to the run-off triangles based on past claims data. Analytical expressions for the standard deviation of the resulting reserve estimates are typically difficult to derive. A popular alternative approach to obtain inference is to use the bootstrap technique. However, the standard procedures are very sensitive to the possible presence of outliers. These atypical observations, deviating from the pattern of the majority of the data, may inflate or deflate traditional reserve estimates and corresponding inference such as their standard errors. Even when paired with a robust chain-ladder method, classical bootstrap inference may break down. Therefore, we discuss and implement several robust bootstrap procedures in the claims reserving framework, and we investigate and compare their performance on both simulated and real data. We also illustrate their use for obtaining the distribution of one-year risk measures.
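    The deterministic chain-ladder algorithm the abstract builds on can be sketched compactly. The following is an illustrative implementation on a cumulative run-off triangle, not the authors' code, and the toy triangle values are made up.

```python
import numpy as np

def chain_ladder(tri):
    """Deterministic chain-ladder on a cumulative run-off triangle.

    `tri` is an (n x n) array of cumulative claims; entry (i, j) is
    accident year i at development year j, with np.nan in the
    not-yet-observed future cells below the anti-diagonal.
    """
    n = tri.shape[0]
    full = tri.copy()
    for j in range(n - 1):
        obs = ~np.isnan(tri[:, j + 1])  # rows where both columns are observed
        f = np.nansum(tri[obs, j + 1]) / np.nansum(tri[obs, j])  # dev. factor
        fill = np.isnan(full[:, j + 1])
        full[fill, j + 1] = full[fill, j] * f  # project the missing cells
    # Reserve per accident year: ultimate minus latest observed diagonal.
    latest = np.array([tri[i, n - 1 - i] for i in range(n)])
    return full[:, -1] - latest

tri = np.array([
    [100., 150., 170., 180.],
    [110., 168., 192., np.nan],
    [120., 180., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
print(chain_ladder(tri))  # estimated reserve per accident year
```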

    Enhanced analysis of real-time PCR data by using a variable efficiency model: FPK-PCR

    Current methodology in real-time polymerase chain reaction (PCR) analysis performs well provided PCR efficiency remains constant over reactions. Yet small changes in efficiency can lead to large quantification errors. Particularly in biological samples, the possible presence of inhibitors forms a challenge. We present a new approach to single-reaction efficiency calculation, called Full Process Kinetics-PCR (FPK-PCR). It combines a kinetically more realistic model with flexible adaptation to the full range of data. By reconstructing the entire chain of cycle efficiencies, rather than restricting the focus to a 'window of application', one extracts additional information and removes a level of arbitrariness. The maximal efficiency estimates returned by the model are comparable in accuracy and precision to both the gold standard of serial dilution and other single-reaction efficiency methods. The cycle-to-cycle changes in efficiency, as described by the FPK-PCR procedure, stay considerably closer to the data than those from other S-shaped models. The assessment of individual cycle efficiencies returns more information than other single-efficiency methods. It allows in-depth interpretation of real-time PCR data and reconstruction of the fluorescence data, providing quality control. Finally, by implementing a global efficiency model, reproducibility is improved, as the selection of a window of application is avoided.
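    The abstract does not give the FPK-PCR equations, so the sketch below only illustrates the generic bookkeeping it refers to: a toy amplification model in which efficiency declines as product accumulates, and recovery of the full chain of cycle efficiencies E_c = F_{c+1}/F_c - 1 from a fluorescence curve. The model form and all parameter values are assumptions.

```python
import numpy as np

def simulate_curve(f0=1e-6, e_max=0.95, f_plateau=1.0, n_cycles=40):
    """Toy amplification: efficiency decays as product accumulates,
    yielding the familiar S-shaped fluorescence curve. Illustrative
    model and parameters only, not the FPK-PCR kinetics."""
    f = [f0]
    for _ in range(n_cycles):
        e = e_max * max(0.0, 1.0 - f[-1] / f_plateau)  # reagent depletion
        f.append(f[-1] * (1.0 + e))
    return np.array(f)

def cycle_efficiencies(f):
    """Recover the chain of per-cycle efficiencies E_c = F_{c+1}/F_c - 1."""
    return f[1:] / f[:-1] - 1.0

curve = simulate_curve()
print(cycle_efficiencies(curve)[:5])  # close to e_max in the early cycles
```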

    Simulation of between repeat variability in real time PCR reactions

    While many decisions rely on real-time quantitative PCR (qPCR) analysis, few attempts have hitherto been made to quantify bounds of precision accounting for the various sources of variation involved in the measurement process. Besides the influence of more obvious factors such as camera noise and pipetting variation, changing efficiencies within and between reactions affect PCR results to a degree that is not fully recognized. Here, we develop a statistical framework that models measurement error and other sources of variation as they contribute to fluorescence observations during the amplification process and to derived parameter estimates. Evaluation of reproducibility is then based on simulations capable of generating realistic variation patterns. To this end, we start from a relatively simple statistical model for the evolution of efficiency in a single PCR reaction and introduce additional error components, one at a time, to arrive at stochastic data generation capable of simulating the variation patterns witnessed in repeated reactions (technical repeats). Most of the variation in Cq values was adequately captured by the statistical model in terms of foreseen components. To recreate the dispersion of the repeats' plateau levels while keeping the other aspects of the PCR curves within realistic bounds, additional sources of reagent consumption (side reactions) enter into the model. Once an adequate data-generating model is available, simulations can serve to evaluate various aspects of PCR under the assumptions of the model and beyond.
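    In the same illustrative spirit, the toy generator below layers the kinds of error components the abstract names (pipetting variation in the starting quantity, a between-repeat efficiency shift, additive camera noise) on top of a simple amplification model, and reports the resulting spread in Cq values across simulated technical repeats. The noise magnitudes and model form are assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_repeat(f0=1e-6, e_max=0.95, f_plateau=1.0, n_cycles=45):
    """One technical repeat with illustrative error components layered in."""
    f0 = f0 * rng.lognormal(0.0, 0.05)     # pipetting: starting-quantity noise
    e_max = e_max * rng.normal(1.0, 0.01)  # between-repeat efficiency shift
    f = [f0]
    for _ in range(n_cycles):
        e = e_max * max(0.0, 1.0 - f[-1] / f_plateau)
        f.append(f[-1] * (1.0 + e))
    return np.array(f) + rng.normal(0.0, 1e-4, n_cycles + 1)  # camera noise

def cq(f, threshold=0.1):
    """Quantification cycle: first cycle where fluorescence crosses threshold."""
    return int(np.argmax(f > threshold))

print([cq(simulate_repeat()) for _ in range(10)])  # dispersion across repeats
```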

    On the consistency of a spatial-type interval-valued median for random intervals

    The sample $d_\theta$-median is a robust estimator of the central tendency or location of an interval-valued random variable. While the interval-valued sample mean can be highly influenced by outliers, this spatial-type interval-valued median remains much more reliable. In this paper, we show that under general conditions the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of an interval-valued random variable.
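    The abstract does not restate the metric, but in this literature the $d_\theta$ distance between two compact intervals is commonly written in terms of their midpoints and spreads, with the spatial-type median minimizing the expected (unsquared) distance; the following reconstruction is offered under that assumption.

```latex
% d_theta distance between compact intervals A and B, in terms of their
% midpoints and spreads (half-widths), with weight theta > 0 on the spreads
% (reconstructed definition -- the abstract itself does not state it):
\[
  d_\theta(A, B) \;=\; \sqrt{\bigl(\operatorname{mid} A - \operatorname{mid} B\bigr)^2
      \;+\; \theta \,\bigl(\operatorname{spr} A - \operatorname{spr} B\bigr)^2 }
\]
% The d_theta-median of a random interval X is any interval M minimizing the
% expected (unsquared) distance, as for spatial medians in R^d:
\[
  M \;\in\; \arg\min_{U}\; \mathbb{E}\bigl[\, d_\theta(X, U) \,\bigr]
\]
```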